Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents
Authors
Abstract
Identifying keyword associations from text and search sources is often used to facilitate many tasks, such as understanding relationships among concepts, extracting relevant documents, matching advertisements to web pages, and expanding user queries. However, these keyword associations change as the underlying content changes over time. Two keywords that are associated with each other during one time period may not be associated in another, or the context in which they are associated may differ. In this paper, we define an equivalence relationship between a pair of keywords and develop methods to construct a temporal view of this relationship. Given a document set D, a keyword a is associated with a context consisting of the frequently occurring keyword sets (fs) of D in which a appears. Two keywords a and b are equivalent in D if their contexts are the same. We say that a and b are temporally equivalent in a time interval if they are equivalent in the documents published during that interval. Given a time-stamped document set D published over a time period T, we define the temporal equivalence partitioning problem: partition T into a sequence of maximal-length time intervals such that, in each interval, keywords a and b are either temporally equivalent or not equivalent. A temporal equivalence partitioning of a document set for a given pair of keywords highlights all of the different contexts in which the keywords are associated, and it can be used to generate time-varying keyword suggestions for users. We show the effectiveness of the approach by constructing temporal equivalence partitionings for several pairs of keywords from the Multi-Domain Sentiment data set and the ICWSM 2009 Spinn3r data set.
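The abstract does not say how the frequent keyword sets are mined or how the partitioning is computed, so the following Python sketch only illustrates the definitions under stated assumptions. Each document is modeled as a (time unit, keyword set) pair, and `min_sup` is a hypothetical relative support threshold. Contexts are assumed to be built from *closed* frequent keyword sets (sets with no superset of equal support). With all frequent sets, the singleton {a} would always appear in a's context but never in b's, so no two distinct keywords could be equivalent. The greedy left-to-right partitioning is a simple heuristic, not necessarily the authors' algorithm. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Keywords = FrozenSet[str]


def support(itemset: Keywords, docs: List[Keywords]) -> int:
    """Number of documents containing every keyword in `itemset`."""
    return sum(1 for d in docs if itemset <= d)


def frequent_sets(docs: List[Keywords], min_sup: float) -> Dict[Keywords, int]:
    """Apriori-style level-wise search; min_sup is a fraction of len(docs)."""
    threshold = min_sup * len(docs)
    level = [frozenset([k]) for k in set().union(*docs)]
    result: Dict[Keywords, int] = {}
    while level:
        frequent = {}
        for s in level:
            c = support(s, docs)
            if c >= threshold:
                frequent[s] = c
        result.update(frequent)
        # Next-level candidates: unions of two frequent sets that grow by one keyword.
        level = list({x | y for x, y in combinations(frequent, 2)
                      if len(x | y) == len(x) + 1})
    return result


def closed_frequent_sets(docs: List[Keywords], min_sup: float) -> Set[Keywords]:
    """Frequent sets with no one-keyword-larger superset of equal support."""
    freq = frequent_sets(docs, min_sup)
    vocab = set().union(*docs)
    return {s for s, c in freq.items()
            if all(freq.get(s | {x}) != c for x in vocab - s)}


def context(keyword: str, closed: Set[Keywords]) -> Set[Keywords]:
    """Assumed context: the closed frequent keyword sets that contain `keyword`."""
    return {s for s in closed if keyword in s}


def equivalent(a: str, b: str, docs: List[Keywords], min_sup: float) -> bool:
    closed = closed_frequent_sets(docs, min_sup)
    ctx_a = context(a, closed)
    # Assumption: an empty context (a is not frequent) never counts as equivalence.
    return bool(ctx_a) and ctx_a == context(b, closed)


def temporal_partition(a: str, b: str, stamped: List[Tuple[int, Keywords]],
                       min_sup: float) -> List[Tuple[int, int, bool]]:
    """Greedy heuristic: from each start, take the longest interval with the same status."""
    times = sorted({t for t, _ in stamped})

    def status(i: int, j: int) -> bool:
        # Equivalence in the documents published from times[i] to times[j].
        window = [kw for t, kw in stamped if times[i] <= t <= times[j]]
        return equivalent(a, b, window, min_sup)

    parts = []
    i = 0
    while i < len(times):
        s = status(i, i)
        j = max(k for k in range(i, len(times)) if status(i, k) == s)
        parts.append((times[i], times[j], s))
        i = j + 1
    return parts


# Toy, made-up data (not from the paper's data sets).
docs = [
    (1, frozenset({"camera", "lens", "battery"})), (1, frozenset({"camera", "lens"})),
    (2, frozenset({"camera", "lens"})), (2, frozenset({"camera", "lens", "zoom"})),
    (3, frozenset({"camera", "battery"})), (3, frozenset({"lens", "tripod"})),
    (4, frozenset({"camera", "battery"})), (4, frozenset({"lens", "tripod"})),
]
print(temporal_partition("camera", "lens", docs, min_sup=0.5))
# [(1, 2, True), (3, 4, False)]
```

In this toy example, "camera" and "lens" always co-occur in time units 1 and 2, so they share a context there. From unit 3 on, they appear with different companions ("battery" and "tripod"), so the sketch reports a non-equivalent interval. Recomputing frequent sets for every candidate interval is quadratic in the number of time units and is only meant for small illustrations.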
Similar resources
Extracting Temporal References to Assign Document Event-Time Periods
This paper presents a new approach for the automatic assignment of document event-time periods. This approach consists of extracting temporal information from document texts, and translating it into temporal expressions of a formal time model. From these expressions, we are able to approximately calculate the event-time periods of documents. The obtained event-time periods can be useful for bot...
Entropy Based Measure Functions for Analyzing Time Stamped Documents
Measure functions that assign numeric values to keywords to capture their significance in a document set play a crucial role in the construction of a time decomposition of a document set. In this paper, we define two measure functions based on the notion of entropy. The interval entropy measure function identifies time intervals that have non-uniform keyword distributions and assigns high measu...
Indexing Techniques for Temporal Text Containment Queries
Many information management systems maintain multiple time stamped versions of documents. The archives of web pages, version control systems, wikis and backup mechanisms are examples of such systems. For such temporally versioned document collections, a search using keywords along the temporal dimension is valuable. This paper studies the temporal dimension of keyword search in the context of t...
Extracting keywords from email data using distributed word vectors
Current keyword extraction methods often use statistical models to select certain words from a set of documents, which fail to take advantage of the information available in the documents themselves. We propose a model for identifying semantic relationships between words in a document to identify keywords that more accurately capture the meaning of the document. Specifically, we use distributed...
A Notion of Equivalence for Multimedia Documents
In this paper we aim at defining a notion of equivalence for multimedia documents which can be described according to different models. If we consider a presentation as a collection of media items and constraints among them, the same temporal behavior can be defined in different ways. In particular, using SMIL language, different sequences of tags can describe the same temporal behavior which c...
Publication date: 2011